Scalable Reliability Modelling of RAID Storage Subsystems

Authors

Prasenjit Karmakar

K. Gopinath

Abstract

Reliability modelling of RAID storage systems, with their various components such as RAID controllers, enclosures, expanders, interconnects and disks, is important from a storage system designer's point of view. A model that can express all the failure characteristics of the whole RAID storage system can be used to evaluate design choices, perform cost-reliability trade-offs and conduct sensitivity analyses. However, including such details quickly makes computational reliability models infeasible. We present a CTMC (continuous-time Markov chain) reliability model for RAID storage systems that scales to much larger systems than previously reported, and we try to model all the components as accurately as possible. We use several state-space reduction techniques at the user level, such as aggregating all in-series components and hierarchical decomposition, to reduce the size of our model. To automate the computation of reliability, we use the PRISM model checker as a CTMC solver where appropriate. We use both variants of probabilistic model checking (PMC), numerical and statistical, depending on the size of the model. Our modelling techniques using PRISM are more practical (in both time and effort) than previously reported Monte Carlo simulation techniques. Our model for RAID storage systems (which includes, for example, disks, expanders and enclosures) uses Weibull distributions for disks and, where appropriate, correlated failure modes for disks, while we use exponential distributions with independent failure modes for all other components. To use the CTMC solver, we approximate the Weibull distribution for a disk by a sum of exponentials, and we confirm that this model gives results in reasonably good agreement with those from the sequential Monte Carlo simulation methods for RAID disk subsystems reported earlier in the literature. Using a combination of scalable techniques, we are able to model and compute reliability for fairly large configurations with up to 600 disks.
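The abstract names aggregation of in-series components as one user-level state-space reduction. A minimal sketch of why this works, assuming independent, exponentially distributed lifetimes (the abstract's assumption for non-disk components): a chain that fails at its first component failure behaves like a single exponential component whose rate is the sum of the individual rates. The component names and rates in `PATH_RATES` are hypothetical placeholders, not values from the paper.

```python
import math

# Hypothetical failure rates (per hour) for the non-disk components on one
# I/O path; placeholder values for illustration, not figures from the paper.
PATH_RATES = {
    "raid_controller": 1.0e-6,
    "expander": 5.0e-7,
    "enclosure": 2.0e-7,
    "interconnect": 1.0e-7,
}


def aggregate_series(rates):
    """Rate of the single exponential that replaces independent exponential
    components in series: the chain fails at the first component failure,
    and the minimum of independent exponentials is exponential with the
    summed rate."""
    return sum(rates)


def series_reliability(rates, t):
    """Probability that every component in the series chain survives to time t."""
    return math.exp(-aggregate_series(rates) * t)


if __name__ == "__main__":
    lam = aggregate_series(PATH_RATES.values())
    print(f"aggregated rate: {lam:.2e} per hour")
    print(f"one-year reliability: {series_reliability(PATH_RATES.values(), 8760.0):.6f}")
```

In a CTMC this replaces one state variable per component with a single transition of rate equal to the summed rates. The shortcut is exact only under the independence and exponential assumptions, so it does not directly apply to the Weibull-distributed disks.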
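The abstract approximates each disk's Weibull lifetime by a sum of exponentials so that a CTMC solver can be used, but it does not state how the phases are chosen. One simple possibility, shown here as an assumption rather than the authors' procedure, is to match the Weibull mean and variance with a two-phase hypoexponential distribution, Exp(l1) + Exp(l2). The shape 1.2 and scale 100,000 hours in the usage lines are illustrative, not disk data.

```python
import math


def weibull_moments(shape, scale):
    """Mean and variance of a Weibull(shape, scale) lifetime."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale ** 2 * (g2 - g1 ** 2)


def fit_two_phase(mean, var):
    """Rates (l1, l2) of Exp(l1) + Exp(l2) with the given mean and variance.

    Phase means a = 1/l1 and b = 1/l2 must satisfy a + b = mean and
    a**2 + b**2 = var, so they are the roots of x**2 - mean*x + (mean**2 - var)/2.
    Positive real roots exist only when the squared coefficient of variation
    var/mean**2 lies in [0.5, 1).
    """
    scv = var / mean ** 2
    if not 0.5 <= scv < 1.0:
        raise ValueError(f"SCV {scv:.3f} outside [0.5, 1): a two-phase sum cannot match it")
    root = math.sqrt(2.0 * var - mean ** 2)
    a, b = (mean + root) / 2.0, (mean - root) / 2.0
    return 1.0 / a, 1.0 / b


if __name__ == "__main__":
    # Illustrative (hypothetical) disk parameters: shape 1.2, scale 100,000 h.
    m, v = weibull_moments(1.2, 100_000.0)
    l1, l2 = fit_two_phase(m, v)
    print(f"phase rates: {l1:.3e}, {l2:.3e} per hour")
```

Each disk then contributes two CTMC states (phase 1, phase 2) before failure. Matching two moments reproduces the mean and variance but not the exact Weibull hazard curve; more phases or a different fitting criterion may be needed for a closer fit.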
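Once the system is expressed as a CTMC, a measure such as mean time to data loss (MTTDL) reduces to a linear system on the generator: the expected absorption times t from the transient states satisfy (-Q_T) t = 1, where Q_T is the generator restricted to transient states. The paper delegates such computations to PRISM; the pure-Python solve below is only a stand-in illustrating the computation, on a hypothetical textbook RAID-5 chain (n disks with independent exponential failures at rate lam, rebuild at rate mu, data loss on a second failure during rebuild) rather than the paper's model. Its result can be checked against the classical closed form MTTDL = ((2n - 1)λ + μ) / (n(n - 1)λ²).

```python
def mean_time_to_absorption(q_t):
    """Expected time to absorption from each transient state of a CTMC.

    q_t is the generator restricted to the transient states; each row's
    shortfall below zero is the rate into the absorbing (data-loss) state.
    The expected times t solve (-q_t) t = 1, here by Gauss-Jordan
    elimination with partial pivoting.
    """
    n = len(q_t)
    a = [[-x for x in row] + [1.0] for row in q_t]  # augmented [-Q_T | 1]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= f * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


def raid5_generator(n_disks, lam, mu):
    """Hypothetical RAID-5 chain: state 0 = all disks healthy, state 1 = one
    disk failed and rebuilding; a second failure in state 1 is data loss."""
    return [
        [-n_disks * lam, n_disks * lam],
        [mu, -((n_disks - 1) * lam + mu)],
    ]


if __name__ == "__main__":
    n, lam, mu = 8, 1.0e-5, 0.1  # illustrative per-hour rates
    mttdl = mean_time_to_absorption(raid5_generator(n, lam, mu))[0]
    closed_form = ((2 * n - 1) * lam + mu) / (n * (n - 1) * lam ** 2)
    print(f"MTTDL: {mttdl:.4e} h (closed form {closed_form:.4e} h)")
```

This chain has only two transient states; the state-space reduction techniques in the abstract matter because a model with controllers, enclosures, expanders and hundreds of phase-type disks has vastly more.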
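Statistical model checking, the other PMC variant the abstract mentions, estimates a property by sampling paths instead of solving the chain. A plain Monte Carlo sketch of the idea (not PRISM's statistical engine): estimate the probability of data loss before a mission time for a hypothetical RAID-5 chain (n disks, per-disk failure rate lam, rebuild rate mu), with a normal-approximation 95% confidence interval. The rates in the usage lines are illustrative.

```python
import math
import random


def loss_before(n_disks, lam, mu, horizon, rng):
    """Simulate one path of a hypothetical RAID-5 chain until data loss or
    until the horizon; True if data is lost by the horizon."""
    t, degraded = 0.0, False
    while True:
        if not degraded:
            t += rng.expovariate(n_disks * lam)  # first disk failure
            if t > horizon:
                return False
            degraded = True
        else:
            fail_rate = (n_disks - 1) * lam
            total = fail_rate + mu
            t += rng.expovariate(total)  # next event while rebuilding
            if t > horizon:
                return False
            if rng.random() < fail_rate / total:
                return True  # second failure before the rebuild completed
            degraded = False  # rebuild finished first


def estimate_loss_probability(n_disks, lam, mu, horizon, runs=10_000, seed=1):
    """Monte Carlo estimate of P(data loss by horizon) and the half-width of
    a normal-approximation 95% confidence interval."""
    rng = random.Random(seed)
    hits = sum(loss_before(n_disks, lam, mu, horizon, rng) for _ in range(runs))
    p = hits / runs
    return p, 1.96 * math.sqrt(p * (1.0 - p) / runs)


if __name__ == "__main__":
    p, hw = estimate_loss_probability(4, 1.0, 2.0, 1.0)  # illustrative rates
    print(f"P(loss by t=1) ~ {p:.3f} +/- {hw:.3f}")
```

For realistic disk rates, data loss within a mission is a rare event, so plain sampling needs many runs to reach a tight interval; this is the usual trade-off against numerical solution, which the abstract handles by choosing the PMC variant according to model size.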

Similar resources

Reliability Modelling of Whole RAID Storage Subsystems

ECKD CAID and RAID: When is the Right Time to Write?

Traditional dual copy, CAID, and RAID DASD subsystems can offer improved data reliability in cases of actuator and/or media failures. However, these schemes impose a write penalty for the extra I/Os required to maintain image copies or parity information for data contained in the subsystem. In this paper, we will employ a review of the 3990-3/6 read and write data flows as a basis for discussin...

RAID0.5: Active Data Replication for Low Cost Disk Array Data Protection

RAID has long been established as an effective way to provide highly reliable as well as high-performance disk subsystems. However, reliability in RAID systems comes at the cost of extra disks. In this paper, we describe a mechanism that we have termed RAID0.5 that enables striped disks with very high data reliability but low disk cost. We take advantage of the fact that most disk systems use b...

DiskReduce: Replication as a Prelude to Erasure Coding in Data-Intensive Scalable Computing

The first generation of Data-Intensive Scalable Computing file systems such as Google File System and Hadoop Distributed File System employed n (n ≥ 3) replications for high data reliability, therefore delivering users only about 1/n of the total storage capacity of the raw disks. This paper presents DiskReduce, a framework integrating RAID into these replicated storage systems to significantly...

DiskReduce: RAID for Data-Intensive Scalable Computing (CMU-PDL-09-112)

Data-intensive file systems, developed for Internet services and popular in cloud computing, provide high reliability and availability by replicating data, typically three copies of everything. Alternatively high performance computing, which has comparable scale, and smaller scale enterprise storage systems get similar tolerance for multiple failures from lower overhead erasure encoding, or RAI...

Journal: CoRR

Volume: abs/1508.02055

Publication date: 2015
